Biostatistics For Dummies (Monika Wahi, John Pezzullo)

Chapter 8

Getting Your Data into the Computer

IN THIS CHAPTER

Understanding levels of measurement (nominal, ordinal, interval, and ratio)

Defining and entering different kinds of data into your research database

Making sure your data are accurate

Creating a data dictionary to describe the data in your database

Before you can analyze data, you have to collect them and get them into the computer in a form that’s suitable

for analysis. Chapter 5 describes this process as a series of steps — figuring out what data you need

and how they are structured, creating data entry forms and computer files to hold your data, and

entering and validating your data.

In this chapter, we describe a crucial part of that process, which is storing the data

properly in your research database. Different kinds of data can be represented in the computer in

different ways. At the most basic level, there are numerical values and classifications, and most of us

can immediately tell the two apart — you don’t have to be a math genius to recognize “age” as

numerical data, and “occupation” as categorical information.
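
To make the distinction concrete, here is a minimal sketch in Python (our choice for illustration, not a tool this chapter prescribes). The records, the field names age and occupation, and the values are all hypothetical. The sketch assumes you store age as a whole number and occupation as a text label, and it shows that each kind of data supports a different kind of summary: you can average the numbers, but you can only tally the categories.

```python
from collections import Counter
from statistics import mean

# Hypothetical participant records, invented purely for illustration.
participants = [
    {"age": 34, "occupation": "nurse"},
    {"age": 51, "occupation": "teacher"},
    {"age": 29, "occupation": "nurse"},
    {"age": 46, "occupation": "farmer"},
]

# Numerical data: arithmetic such as averaging is meaningful.
ages = [p["age"] for p in participants]
mean_age = mean(ages)

# Categorical data: arithmetic is not meaningful, but counting
# how often each category occurs is.
occupations = [p["occupation"] for p in participants]
occupation_counts = Counter(occupations)

print("Mean age:", mean_age)
print("Occupation counts:", dict(occupation_counts))
```

Trying to average the occupation labels would simply fail, which is the computer's way of telling you that the variable is a classification and not a number.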

So why are we devoting a whole chapter to describing, entering, and checking different types of data?

It turns out that the topic of data storage is not quite as trivial as it may seem at first. You need to be

aware of some important details or you may wind up collecting your data the wrong way and finding

out too late that you can’t run the appropriate analysis. This chapter starts by explaining the different

levels of measurement, and shows you how to define and store different types of data. It also suggests

ways to check your data for errors, and explains how to formally describe your database so that others

are able to work with it if you’re not available.

Looking at Levels of Measurement

Around the middle of the 20th century, the idea of levels of measurement caught the attention of

biological and social-science researchers and, in particular, psychologists. One classification scheme,

which has become widely used (at least in statistics textbooks), recognizes four levels at which

variables can be measured (nominal, ordinal, interval, and ratio):

Nominal variables are expressed as mutually exclusive categories, like country of origin (United

States, China, India, and so on), type of care provider (nurse, physician, social worker, and so on),

and type of bacteria (such as coccus, bacillus, rickettsia, mycoplasma, or spirillum). Nominal

indicates that the sequence in which you list the different categories is purely arbitrary. For

example, listing type of care provider as nurse, physician, and social worker is no more or less

natural than listing them as social worker, nurse, and physician.